Abstract. Motivated by current trends in cloud computing, we study a version of the generalized assignment
problem where a set of virtual processors has to be implemented by a set of identical processors. For literature
consistency, we say that a set of virtual machines (VMs) is assigned to a set of physical machines (PMs). The
optimization criterion is to minimize the power consumed by all the PMs. We term the problem Virtual Machine
Assignment (VMA). Crucial differences with previous work include a variable number of PMs, that the VMs cannot
be implemented fractionally (i.e., each VM must be assigned to exactly one PM), and a parametric minimum power
consumption for each active PM. We show that the VMA problem is NP-hard in the strong sense and we present a
VMA offline approximation algorithm. For this VMA protocol, we show the trade-off between the running time and
the approximation ratio achieved. Furthermore, restricting the VMA problem to realistic applications, we observe
that such protocol is a PTAS^1 for the VMA problem, while there is no FPTAS^2. Moving to online VMA algorithms,
we show upper and lower bounds on the competitive ratio when only 2 PMs are available, lower bounds when some
arbitrary number m of PMs are available, and an upper bound when the number of machines is unbounded. We also
carry out extensive simulations using real-world input such as Google cluster data. To the best of our knowledge, this
is the first time the VMA problem is studied for this cost function.

1 Introduction

The current pace of technology developments, and the continuous change in business requirements, may
rapidly render a given proprietary computational platform obsolete, oversized, or insufficient. Thus, outsourcing
has recently become a popular approach to obtain computational services. Furthermore, in order
to attain flexibility, such service is usually virtualized, so that the user may tune the computational platform
to its particular needs. Users of such service need not be aware of the particular implementation, but
only of the specification of the virtual machine they use. This conceptual approach to outsourced computing
has been termed cloud computing, in reference to the cloud symbol used as an abstraction of a complex
infrastructure in system diagrams. Current examples of cloud computing providers include Amazon Web
Services [1], Rackspace [5], and Citrix [2].

Depending on the specific service provided, the cloud computing model comes in different flavors,
such as infrastructure as a service, platform as a service, storage as a service, etc. In each of these
models, the user may choose specific parameters of the computational resources provided, for instance,
processing power, memory size, communication bandwidth, etc. Thus, in a cloud-computing service platform,
various virtual machines (VMs) with user-defined specifications must be implemented by, or assigned to,^3
various physical machines (PMs).^4 Furthermore, such a platform must be scalable, allowing more PMs to be
added should business growth require such expansion. In this work, we call this problem the Virtual
Machine Assignment (VMA) problem. (A precise definition is given in Section 1.1.)

* This work is supported by the National Science Foundation (CCF-0937829, CCF-1114930), Comunidad de Madrid
grant S2009TIC-1692, MINECO grant TEC2011-29688-C02-01, and National Natural Science Foundation of China grant
61020106002, MICINN grant Juan de la Cierva.

^1 Polynomial-Time Approximation Scheme.

^2 Fully Polynomial-Time Approximation Scheme.

^3 The cloud-computing literature uses instead the term placement. We choose here the term assignment for consistency with the
literature on general assignment problems.

From the previous discussion, it can be seen that, underlying VMA, there is some form of bin-packing
problem. However, in VMA the number or capacity of PMs (i.e., bins for bin packing) may be increased
if needed. The optimization criterion for VMA depends on the particular objective function sought.
Previous work related to VMA has focused on minimizing the number of PMs used (cf. [11] and the
references therein) to minimize energy consumption. However, power consumption is usually superlinear
on the load of a given computational resource [10,19]. Hence, the use of extra PMs may be more efficient
energy-wise than a minimum number of heavily-loaded PMs. On the other hand, the addition of a new PM
to the system usually implies some fixed power-consumption increase [9,19], even if such PM is not loaded.
In this work, we combine both power-consumption factors. That is, for some parameters $\alpha > 1$ and $b > 0$,
we seek to minimize the sum of the $\alpha$-th powers of the PM loads plus the fixed cost $b$ of using each PM.

The paper is organized as follows. First, in Section 1.1 we include a formal definition of the problem. Then,
we overview related work in Section 1.2 and we detail our results in Section 1.3. Some preliminaries are
included in Section 2. For the model described, we show that the VMA problem is NP-hard in the strong
sense, even if $\alpha$ is fixed, in Section 3. Then, we present an offline VMA approximation algorithm in Section 4,
which in fact is a PTAS for this problem for any realistic model of power consumption. We also
study the competitiveness of online VMA algorithms (Section 5). For settings where a bounded number $m$
of PMs are available, we show a lower bound on the competitiveness of online algorithms (with respect to
an optimal offline assignment) in Section 5.1. We improve such bound in Section 5.2 for settings where
only 2 PMs are available. We also show upper bounds on the competitiveness of online algorithms for the
setting where 2 PMs are available in Section 5.3. Finally, in Section 5.4 we show an upper bound on the
competitiveness of online VMA algorithms when the number of PMs is not bounded. We also present and
evaluate experimentally some heuristics in Section 6.

1.1 Problem Definition 

We define the Virtual Machine Assignment (VMA) problem as follows: 

Input: A set $S = \{s_1, \dots, s_m\}$ of $m$ identical physical machines. Rational numbers $\alpha$ and $b$, where $\alpha > 1$
and $b > 0$. A set $D = \{d_1, \dots, d_n\}$ of $n$ virtual machines. A function $\ell : D \to \mathbb{R}$ that gives the CPU
load each virtual machine incurs.

Output: A partition $\pi = \{A_1, \dots, A_m\}$ of $D$.

Objective function: Minimize the power consumption given by the function

$$P(\pi) = \sum_{i \in [1,m] : A_i \neq \emptyset} \left( \Big( \sum_{d_j \in A_i} \ell(d_j) \Big)^{\alpha} + b \right).$$

For convenience, we define the following notation. We overload the function $\ell(\cdot)$ to be applied over sets
of virtual machines, so that $\ell(A_i) = \sum_{d_j \in A_i} \ell(d_j)$. Also, let us define the function $f(\cdot)$, such that $f(x) = 0$
if $x = 0$ and $f(x) = x^{\alpha} + b$ otherwise. Then, the objective function is to minimize

$$P(\pi) = \sum_{i=1}^{m} f(\ell(A_i)).$$



^4 We choose the notation VM and PM for simplicity and consistency, but notice that our study applies to any computational
resource assignment problem, as long as the minimization function is the one modeled here. 
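
As a small illustration of the objective function defined in this section, the following sketch evaluates the power consumption of a given assignment. The representation of a partition as a list of lists of loads and the names `f` and `power_cost` are assumptions made here for illustration; they do not come from the paper.

```python
from typing import Sequence


def f(x: float, alpha: float, b: float) -> float:
    """Power of one PM with load x: 0 if unused, x**alpha + b otherwise."""
    return 0.0 if x == 0 else x ** alpha + b


def power_cost(partition: Sequence[Sequence[float]], alpha: float, b: float) -> float:
    """Total power of an assignment.

    partition[i] holds the loads of the VMs assigned to PM i
    (a hypothetical encoding of the sets A_1, ..., A_m).
    """
    return sum(f(sum(vm_loads), alpha, b) for vm_loads in partition)


# Three VMs with loads 1, 2, 3 on m = 3 PMs, with alpha = 2 and b = 4.
packed = power_cost([[1.0, 2.0, 3.0], [], []], alpha=2.0, b=4.0)  # one active PM
spread = power_cost([[1.0, 2.0], [3.0], []], alpha=2.0, b=4.0)    # two active PMs
```

With these illustrative values, using a second PM is cheaper than packing every VM into one, which is the effect that the superlinear term in the cost function is meant to capture.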



1.2 Related Work 

To the best of our knowledge, previous work on VMA has been only experimental [16,26,31,23] or has
focused on different cost functions [15,17,7,11]. First, we provide an overview of previous theoretical work
for related assignment problems (storage allocation, scheduling, network design, etc.). The cost functions
considered in that work resemble or generalize the power cost function under consideration here. Secondly,
we overview related experimental work.

Chandra and Wong [15], and Cody and Coffman [17] study a problem for storage allocation that is a
variant of VMA with $b = 0$ and $\alpha = 2$. Hence, this problem tries to minimize the sum of the squares of the
machine-load vector. They study the offline version of the problem and provide algorithms with constant
approximation ratio. A significant leap was taken by Alon et al. [7], since they present a PTAS for the
problem of minimizing the $L_p$ norm of the load vector, for any $p > 1$. This problem has the previous one as
special case, and is also a variant of the VMA problem when $p = \alpha$ and $b = 0$.

Bansal, Chan, and Pruhs minimize arbitrary power functions for speed scaling in job scheduling [10].
The problem is to schedule the execution of $n$ computational jobs on a single processor, whose speed may
vary within a countable collection of intervals. Each job has a release time, a processing work to be done, a
weight characterizing its importance, and its execution can be suspended and restarted later without penalty.
A scheduler algorithm must specify, for each time, a job to execute and a speed for the processor. The goal
is to minimize the weighted sum of the flow times over all jobs plus the energy consumption, where the
flow time of a job is the time elapsed from release to completion and the energy consumption is given by
$s^{\alpha}$, where $s$ is the processor speed and $\alpha > 1$ is some constant. For the online algorithm shortest remaining
processing time first, the authors prove a $(3 + \epsilon)$ competitive ratio for the objective of total weighted flow
plus energy. For the online algorithm highest density first (HDF), where the density of a job is its
weight-to-work ratio, they prove a $(2 + \epsilon)$ competitive ratio for the objective of fractional weighted flow
plus energy.

A generalization of the above problem is studied by Gupta, Krishnaswamy, and Pruhs in [19]. The
question addressed is how to assign jobs, possibly fractionally, to unrelated parallel machines in an online
fashion in order to minimize the sum of the $\alpha$-th powers of the machine loads plus the assignment costs. Upon
arrival of a job, the algorithm learns the increase on the load and the cost of assigning a unit of such job to a
machine. Jobs cannot be suspended and/or reassigned. The authors model a greedy algorithm that assigns a
job so that the cost is minimized as solving a mathematical program with constraints arriving online. They
show a competitive ratio of $\alpha^{\alpha}$ with respect to the solution of the dual program, which is a lower bound for
the optimal. References to previous work on the particular case of minimizing energy with deadlines can be
found in this paper.

Recently, Im, Moseley, and Pruhs studied online scheduling for general cost functions of the flow time,
with the only restriction that such function is non-decreasing [21]. In their model, a collection of jobs, each
characterized by a release time, a processing work, and a weight, must be processed by a single server
whose speed is variable. A job can be suspended and restarted later without penalty. The authors show that
HDF is $(2 + \epsilon)$-speed $O(1)$-competitive against the optimal algorithm on a unit-speed processor, for all
non-decreasing cost functions of the flow time. Furthermore, they also show that this ratio cannot be improved
significantly, proving impossibility results if the cost function is not uniform among jobs or the speed cannot
be significantly increased.

Similar cost functions have been considered for the minimum cost network-design problem. In this
problem packets have to be routed through a (possibly multihop) network of speed scalable routers. There is
a cost associated to assigning a packet to a link and to the speed or load of the router. The goal is to route all
packets minimizing the aggregated cost. In [8] and [9] the authors show offline algorithms for this problem
that achieve polynomial and poly-logarithmic approximation, respectively, where the cost function is the
$\alpha$-th power of the link load plus a link assignment cost, for any constant $\alpha > 1$. The same problem and cost
function is studied in [19] (the assignment cost is omitted for clarity). As for the scheduling problem, the
authors model a greedy algorithm as solving a mathematical program with constraints arriving online. They
show a competitive ratio of $\alpha^{\alpha}$ with respect to the solution of the dual program, which is a lower bound for
the optimal.

The experimental work related to VMA is vast and its detailed overview is out of the scope of this paper.
Some of this work does not minimize energy [14,24,27] or it applies to a model different than ours
(VM migration [29,30], knowledge of future load [25,30], feasibility of allocation [11], multilevel architecture
[26,29,22], interconnected VMs [12], etc.). On the other hand, some of the experimental work where
minimization of energy is evaluated focuses on a more restrictive cost function [33,22,34].

In [22], for an energy cost model that is linear, the authors evaluate experimentally the allocation of VMs
to clusters following 7 placement policies, some of them included in popular cloud platforms [4,3]. Namely,
Round Robin, Striping, Packing, Load Balancing (free CPU count), Load Balancing (free CPU ratio), Watts
per Core, and Cost per Core. We adapt 5 of these policies (defined later) to our model and cost function for the
purpose of simulations.

In [30], the authors focus on an energy-efficient VM placement problem with two requirements: CPU
and disk. These requirements are assumed to change dynamically and the goal is to consolidate loads among
servers, possibly using migration at no cost. In our model, VM assignment is based on a CPU requirement
that does not change and migration is not allowed. Should any other resource be the dominating energy
cost, the same results apply for that requirement. Also, if loads change and migration is free, an offline
algorithm can be used each time that a load changes or a new VM arrives. In [30] it is shown experimentally
that energy-efficient VMA does not merely reduce to a packing problem, that is, to minimizing the number
of PMs used even if their load is close to their maximum capacity. For our model, we show here that the
optimal load of a given server is a function only of the fixed cost of being active ($b$) and the exponential rate
of power increase on the load ($\alpha$). That is, the optimal load is not related to the maximum capacity of a PM.

1.3 Our Results 

In this work, we study offline and online versions of the VMA problem. First, using a reduction from
3-PARTITION, we show that VMA is NP-hard in the strong sense, even if $\alpha$ is constant. This result implies
that the VMA problem does not have a fully polynomial-time approximation scheme (FPTAS), even if $\alpha$ is
constant. Then, we present a VMA algorithm that achieves an approximation ratio of $(1 + \epsilon)^{\alpha}$ with respect to
the power consumption of an optimal assignment, for any constant $\epsilon > 0$. We also show an
$O(\min(n, m)(n + g(1/\epsilon) \log^{O(1)} n))$ upper bound on the running time of such algorithm, where $g(\cdot)$ is some function that
grows at least exponentially. We observe that, when $\alpha$ is constant as in any realistic power consumption
model, this algorithm is a polynomial-time approximation scheme (PTAS) for the VMA problem. Hence, for
constant $\alpha$, we fully characterize the offline version of the VMA problem, since a PTAS is presented and no
FPTAS may exist.

Then we move on to online VMA algorithms. That is, we assume that VMs are revealed to the algorithm
one by one, and the assignments made by the algorithm are final. First, we show that, when the
number of PMs is bounded, no online VMA algorithm is $3^{\alpha}/(2^{\alpha+2} + \epsilon)$-competitive for any $\epsilon > 0$,
and we show a stronger lower bound of $3^{\alpha}/2^{\alpha+1}$ for a system where only 2 PMs are available. Moving
to upper bounds, for a system with only 2 PMs, we present an online algorithm that is optimal when
$\ell(D) \le (b/(2^{\alpha} - 2))^{1/\alpha}$ and achieves a competitive ratio of at most $\max\{2, (3/2)^{\alpha-1}\}$ otherwise. Finally,
for a system where PMs may be added on demand, we present a VMA online algorithm that, if no VM $d_i$
has $\ell(d_i) < x^*$ (where $x^* = (b/(\alpha - 1))^{1/\alpha}$), achieves optimal competitive ratio 1. Otherwise, it achieves a
$2^{\alpha-1} + x^*/\big(\sum_{d_i : \ell(d_i) < x^*} \ell(d_i)\big)$ competitive ratio.

2 Preliminaries 

The following observations will be used in the analysis. We call power rate the power consumed per unit of
load in a PM. Let $x$ be the load of a PM. Then, its power rate is computed as $f(x)/x$. The load at which the
power rate is minimized, denoted $x^*$, is the optimal load, and the corresponding rate is the optimal power
rate $\rho^* = f(x^*)/x^*$. Using calculus we get:

Observation 1 The optimal load is:

$$x^* = (b/(\alpha - 1))^{1/\alpha}.$$

Equivalently, for any $x \neq x^*$, $f(x)/x > \rho^*$.
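
The calculus step behind Observation 1 is short: for $x > 0$ the power rate is $f(x)/x = x^{\alpha-1} + b/x$, whose derivative $(\alpha-1)x^{\alpha-2} - b/x^2$ vanishes exactly when $x^{\alpha} = b/(\alpha-1)$. The sketch below checks this numerically on a grid. The function names and the parameter values $\alpha = 3$, $b = 16$ are assumptions chosen only so that the numbers come out round.

```python
def f(x: float, alpha: float, b: float) -> float:
    """Power of one PM with load x."""
    return 0.0 if x == 0 else x ** alpha + b


def optimal_load(alpha: float, b: float) -> float:
    """Closed form of Observation 1: x* = (b / (alpha - 1)) ** (1 / alpha)."""
    return (b / (alpha - 1.0)) ** (1.0 / alpha)


def power_rate(x: float, alpha: float, b: float) -> float:
    """Power consumed per unit of load, f(x) / x, for x > 0."""
    return f(x, alpha, b) / x


alpha, b = 3.0, 16.0                       # illustrative values only
x_star = optimal_load(alpha, b)            # (16 / 2) ** (1 / 3), about 2
rho_star = power_rate(x_star, alpha, b)    # (8 + 16) / 2, about 12

# Brute-force check: the best load on a grid of step 0.01 sits next to x*.
grid = [0.01 * k for k in range(1, 1001)]
best_on_grid = min(grid, key=lambda x: power_rate(x, alpha, b))
```

Substituting $b = (\alpha-1)(x^*)^{\alpha}$ into $f(x^*)/x^*$ also gives the compact form $\rho^* = \alpha (x^*)^{\alpha-1}$, which equals 12 for the values used above.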

Lemma 1. Given an instance of the VMA problem with a set of VMs $D = \{d_1, \dots, d_n\}$, any solution
$\pi = \{A_1, \dots, A_m\}$ where $\sum_{d \in A_i} \ell(d) \neq x^*$ for some $i \in [1, m]$, satisfies

$$P(\pi) > \rho^* \ell(D) = \rho^* \sum_{d \in D} \ell(d).$$

Proof. The total cost of $\pi$ is $\sum_{i \in [1,m]} f(\ell(A_i))$ which, from Observation 1, is at least

$$\sum_{i \in [1,m]} \ell(A_i) \rho^* = \rho^* \sum_{i \in [1,m]} \sum_{d \in A_i} \ell(d) = \rho^* \sum_{d \in D} \ell(d).$$

Corollary 1. Given an instance of the VMA problem with VMs $D = \{d_1, \dots, d_n\}$, any solution $\pi$ satisfies

$$P(\pi) \ge \rho^* \sum_{i=1}^{n} \ell(d_i).$$

3 NP-hardness of VMA 

In this section we show that the VMA problem is NP-hard in the strong sense. 
Theorem 1. VMA is strongly NP-hard, even if $\alpha$ is constant.

Proof. We show a reduction from 3-Partition, which is strongly NP-complete and is defined as follows [18].

INSTANCE: Set $A$ of $3m$ elements, a bound $B \in \mathbb{Z}^+$ and, for each $a \in A$, a size $s(a) \in \mathbb{Z}^+$ such that
$B/4 < s(a) < B/2$ and $\sum_{a \in A} s(a) = mB$.

QUESTION: Can $A$ be partitioned into $m$ disjoint sets $\{A_1, A_2, \dots, A_m\}$ such that $\sum_{a \in A_i} s(a) = B$
for each $1 \le i \le m$?

The reduction is as follows. Given an instance of 3-Partition on a set $A = \{a_1, \dots, a_{3m}\}$ with bound $B$,
and given a fixed value $\alpha > 1$, we define an instance $I$ of VMA as follows: $D = \{a_1, \dots, a_{3m}\}$, $\ell(\cdot) = s(\cdot)$,
$|S| = m$, and $b = B^{\alpha}(\alpha - 1)$. We show now that the answer to the 3-Partition problem is YES if and only if
the output $\pi = \{A_1, A_2, \dots, A_m\}$ of the VMA problem on input $I$ is such that $\sum_{i=1}^{m} f(\ell(A_i)) = m f(B)$.

For the direct implication, assume that there exists a partition $\{A_1, A_2, \dots, A_m\}$ of $A$ such that for
each $1 \le i \le m$, $\sum_{a \in A_i} s(a) = B$. Then, in the context of the VMA problem, such partition has cost
$\sum_{i=1}^{m} f(\ell(A_i)) = m f(B)$. We claim that any partition has at least cost $m f(B)$. In order to prove it, assume
for the sake of contradiction that there is a partition $\pi' = \{A'_1, A'_2, \dots, A'_m\}$ of VMA on input $I$ with cost
less than $m f(B)$. Then, there is some $i \in [1, m]$ such that $\ell(A'_i) \neq B$. From Lemma 1, $P(\pi') > \rho^* \ell(D) =
(f(x^*)/x^*) mB$. Since $B = x^*$, we have that $P(\pi') > m f(B)$, which is a contradiction.

To prove the reverse implication, assume an output $\pi = \{A_1, A_2, \dots, A_m\}$ of the VMA problem on
input $I$ such that $P(\pi) = \sum_{i=1}^{m} f(\ell(A_i)) = m f(B)$. Then, it must be $\forall i \in [1, m]$, $\ell(A_i) = B$. Otherwise,
from Lemma 1, $P(\pi) > m f(B)$, as shown above.
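
To make the reduction concrete, the sketch below builds the VMA instance from a 3-Partition instance and compares the optimal power with $m f(B)$. The exhaustive search is only a stand-in for an exact VMA solver and is exponential in the number of VMs, so it is meant for toy inputs; the function names and the two sample instances are assumptions introduced here.

```python
from itertools import product


def f(x, alpha, b):
    """Power of one PM with load x."""
    return 0.0 if x == 0 else x ** alpha + b


def reduce_3partition(sizes, bound, alpha):
    """Map a 3-Partition instance to VMA: loads = sizes, m PMs, b = B^alpha (alpha - 1)."""
    m = len(sizes) // 3
    b = bound ** alpha * (alpha - 1)
    return list(sizes), m, b


def optimal_cost(loads, m, alpha, b):
    """Try every assignment of VMs to PMs (exponential; tiny inputs only)."""
    best = float("inf")
    for assignment in product(range(m), repeat=len(loads)):
        pm_loads = [0] * m
        for load, pm in zip(loads, assignment):
            pm_loads[pm] += load
        best = min(best, sum(f(x, alpha, b) for x in pm_loads))
    return best


def answers_yes(sizes, bound, alpha=2):
    """3-Partition is YES iff the optimal VMA power equals m * f(B)."""
    loads, m, b = reduce_3partition(sizes, bound, alpha)
    target = m * f(bound, alpha, b)
    return abs(optimal_cost(loads, m, alpha, b) - target) < 1e-9


yes_instance = answers_yes([6, 6, 8, 6, 7, 7], bound=20)   # {6,6,8} and {6,7,7}
no_instance = answers_yes([6, 6, 6, 6, 7, 9], bound=20)    # any triple with 9 exceeds 20
```

In both samples every size lies strictly between $B/4 = 5$ and $B/2 = 10$ and the sizes add up to $mB = 40$, as the 3-Partition definition requires.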

It is known that strongly NP-hard problems cannot have a fully polynomial-time approximation scheme
(FPTAS) [32]. Hence, we have the following corollary.



Corollary 2. VMA does not have a fully polynomial-time approximation scheme (FPTAS), even if a is 
constant. 

Observe that the problem remains NP-hard when m is fixed to 2. The proof uses a simple reduction 
from the partition problem, in which it is decided whether a multiset of integers can be partitioned into two 
subsets of equal sum. 

4 Off-line Approximation Algorithms 

In this section we show that there are algorithms to approximate the optimal off-line solution of VMA.
Furthermore, if $\alpha$ is a constant there is a PTAS for the problem.

Theorem 2. For every constant $\epsilon > 0$, there is an algorithm for the VMA problem with approximation ratio
$(1 + \epsilon)^{\alpha}$ and time complexity $O(\min(n, m) \cdot (n + g(1/\epsilon) \log^{O(1)} n))$, for some function $g(\cdot)$ that grows at
least exponentially.

Proof. We prove our claim by providing the VMA algorithm and showing its approximation ratio and running
time. Our algorithm uses the algorithm proposed by Alon et al. [7] for the Norm Minimization problem,
whose input is similar to VMA, and where the objective is to minimize the $L_{\alpha}$ norm of the PM loads.
Translated to our notation, the Norm Minimization problem is to find the partition $\pi = \{A_1, \dots, A_m\}$ of $D$
that minimizes $N_m(\pi) = \big(\sum_{i=1}^{m} \ell(A_i)^{\alpha}\big)^{1/\alpha}$.

In their work, for any constant $\epsilon > 0$, they present a Norm Minimization algorithm that achieves an
approximation ratio of $1 + \epsilon$ in time $O(n + g(1/\epsilon) \log^{O(1)} n)$, where $g(\cdot)$ is a function that grows at least
exponentially.

Given the set $D$ of VMs, let us define the multiset $D' = \{\ell(d_i) \mid d_i \in D\}$, and let us denote by
norm$(D', \epsilon, \alpha, m)$ the partition found by the Norm Minimization algorithm for the given parameters $D'$, $\epsilon$, $\alpha$,
and $m$. The VMA algorithm is as follows. First, it executes $\gamma = \min(n, m)$ instances of the norm algorithm,
each with a different number of PMs. (Note that there cannot be more than $n$ PMs used.) In particular, we
obtain partitions $\pi_1, \dots, \pi_{\gamma}$, where $\pi_i$ is the partition found by the execution of norm$(D', \epsilon, \alpha, i)$. Then, the
VMA algorithm outputs the partition $\pi_i$ that minimizes $P(\pi_i)$, for $i \in \{1, \dots, \gamma\}$.

Let us assume that the optimal solution of the VMA problem, denoted $\pi^*$, uses $k$ PMs. Observe that $\pi^*$
also minimizes $N_k(\cdot)$. Hence, since $\pi_k = \{A_1, \dots, A_k\}$ gives an approximation ratio of $1 + \epsilon$ to the optimal
solution of the Norm Minimization problem, we have that

$$P(\pi_k) = k \cdot b + (N_k(\pi_k))^{\alpha} \le k \cdot b + ((1 + \epsilon) N_k(\pi^*))^{\alpha} \le (1 + \epsilon)^{\alpha} P(\pi^*).$$

Since the VMA algorithm defined outputs the partition that makes $P(\pi_i)$ smallest, the approximation
bound follows. Given that the VMA algorithm is composed of $\min(n, m)$ calls to the norm
algorithm, the claimed time complexity follows.
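
The structure of the algorithm in the proof, one call to a norm-minimization routine per candidate number of PMs followed by taking the cheapest result, can be sketched as follows. The subroutine used here is a plain longest-load-first greedy heuristic, a placeholder chosen only to keep the example short: it is not the scheme of Alon et al. and carries no $(1 + \epsilon)$ guarantee. All function names are assumptions.

```python
def power_cost(pm_loads, alpha, b):
    """Total power: each PM with positive load pays load**alpha + b."""
    return sum(x ** alpha + b for x in pm_loads if x > 0)


def greedy_norm(loads, k):
    """Placeholder for norm(D', eps, alpha, k): largest load first, least-loaded PM.

    This is NOT the PTAS of Alon et al.; it only fills the role of the subroutine.
    """
    bins = [[] for _ in range(k)]
    totals = [0.0] * k
    for load in sorted(loads, reverse=True):
        i = min(range(k), key=totals.__getitem__)
        bins[i].append(load)
        totals[i] += load
    return bins


def vma_offline(loads, m, alpha, b, norm=greedy_norm):
    """Run the subroutine for k = 1..min(n, m) PMs and keep the cheapest partition."""
    best_partition, best_cost = None, float("inf")
    for k in range(1, min(len(loads), m) + 1):
        partition = norm(loads, k)
        cost = power_cost([sum(a) for a in partition], alpha, b)
        if cost < best_cost:
            best_partition, best_cost = partition, cost
    return best_partition, best_cost


# Six VMs of load 2 with alpha = 3, b = 16 (so x* = 2): one VM per PM is best.
spread_partition, spread_cost = vma_offline([2.0] * 6, m=6, alpha=3.0, b=16.0)
# Small VMs and a large fixed cost (alpha = 2, b = 100, x* = 10): one PM is best.
packed_partition, packed_cost = vma_offline([1.0, 2.0, 3.0, 4.0], m=4, alpha=2.0, b=100.0)
```

Passing the subroutine as the `norm` argument keeps the wrapper independent of how the norm-minimization step is actually solved.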



When $\alpha$ is a constant, this result provides a PTAS for the VMA problem, by simply choosing for each
constant $\delta > 0$ an appropriate constant $\epsilon$ such that $1 + \delta = (1 + \epsilon)^{\alpha}$, and using this new value in the above
theorem.

Corollary 3. When a is constant, there is a polynomial-time approximation scheme (PTAS) for the VMA 
problem. 

5 Competitiveness of Online Algorithms 

In this section we study the online version of VMA. First, we prove a lower bound on the competitiveness of 
any online algorithm in a system with a bounded number m of PMs, and a stronger lower bound for m = 2. 
Later, we present algorithms with bounded competitiveness for systems with 2 PMs and with unbounded 
number of PMs. 

5.1 Lower Bound for Bounded Number of Physical Machines 

We show in this section that for $m$ PMs there is a general lower bound on the competitive ratio of
$3^{\alpha}/(2^{\alpha+2} + \epsilon)$, for any $\epsilon > 0$. Let us first prove the following lemmas.

Lemma 2. Let $x^* \le \ell_1 \le \ell_2$. Then $f(\ell_1 + \ell_2) > f(\ell_1) + f(\ell_2)$.

Proof. Let us assume that $\ell_1 = \delta x^*$ and $\ell_2 = (\delta + \epsilon) x^*$. Note that $\delta$ is at least 1 and $\epsilon$ is not negative. Using
that $x^* = (b/(\alpha - 1))^{1/\alpha}$, the claim to be proven becomes

$$(\delta x^* + (\delta + \epsilon) x^*)^{\alpha} + b > (\delta x^*)^{\alpha} + ((\delta + \epsilon) x^*)^{\alpha} + 2b$$
$$(2\delta + \epsilon)^{\alpha} > \delta^{\alpha} + (\delta + \epsilon)^{\alpha} + \alpha - 1.$$

From the convexity of $g(x) = x^{\alpha}$, we have that $(2\delta + \epsilon)^{\alpha} - (2\delta)^{\alpha} \ge (\delta^{\alpha} + (\delta + \epsilon)^{\alpha}) - (\delta^{\alpha} + \delta^{\alpha})$. Hence
it is enough to have $(2\delta)^{\alpha} > 2\delta^{\alpha} + \alpha - 1$, which is true for any $\delta \ge 1$.

From this lemma, it follows that if a PM has at least 2 VMs, each with load larger than $x^*$, and there
are unused PMs, the power consumption can be reduced by moving one VM to an unused PM. When this is
done in a given partition we say that we are using Lemma 2.

Lemma 3. Let $L > 0$ and $\ell_2 < \ell_1 \le L/2$. Then $f(\ell_1) + f(L - \ell_1) < f(\ell_2) + f(L - \ell_2)$.

Proof. Let us assume that $\ell_1 = \delta_1 L$ and $\ell_2 = \delta_2 L$. Note that $\delta_2 < \delta_1 \le 1/2$. Then, the claim to be proven
becomes

$$(\delta_1 L)^{\alpha} + ((1 - \delta_1) L)^{\alpha} < (\delta_2 L)^{\alpha} + ((1 - \delta_2) L)^{\alpha},$$

which holds because the function $g(x) = x^{\alpha} + (1 - x)^{\alpha}$ is decreasing in the interval $[0, 1/2)$.

This lemma carries the intuition that balancing the load among the used PMs as much as possible reduces 
the power consumption. 
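
Both lemmas are easy to sanity-check numerically. The sketch below draws random parameters that satisfy the hypotheses of Lemma 2 and Lemma 3 and records whether the stated inequalities hold. It is a spot check under arbitrarily chosen sampling ranges, not a proof, and the helper names are assumptions.

```python
import random


def f(x, alpha, b):
    """Power of one PM with load x."""
    return 0.0 if x == 0 else x ** alpha + b


def optimal_load(alpha, b):
    return (b / (alpha - 1.0)) ** (1.0 / alpha)


def lemma2_holds(l1, l2, alpha, b):
    """Splitting two loads of at least x* over two PMs is strictly cheaper."""
    return f(l1 + l2, alpha, b) > f(l1, alpha, b) + f(l2, alpha, b)


def lemma3_holds(total, l1, l2, alpha, b):
    """For l2 < l1 <= total / 2, the more balanced split (l1, total - l1) is cheaper."""
    balanced = f(l1, alpha, b) + f(total - l1, alpha, b)
    skewed = f(l2, alpha, b) + f(total - l2, alpha, b)
    return balanced < skewed


rng = random.Random(0)          # fixed seed so the check is reproducible
lemma2_checks, lemma3_checks = [], []
for _ in range(1000):
    alpha = rng.uniform(1.1, 4.0)
    b = rng.uniform(0.5, 10.0)
    x_star = optimal_load(alpha, b)
    l1 = x_star * rng.uniform(1.0, 5.0)        # x* <= l1
    l2 = l1 * rng.uniform(1.0, 5.0)            # l1 <= l2
    lemma2_checks.append(lemma2_holds(l1, l2, alpha, b))
    total = rng.uniform(1.0, 20.0)
    big = total * rng.uniform(0.05, 0.5)       # big <= total / 2
    small = big * rng.uniform(0.1, 0.9)        # small < big
    lemma3_checks.append(lemma3_holds(total, big, small, alpha, b))
```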

Theorem 3. When the number of PMs is $m$ (bounded), no online VMA algorithm can achieve a competitive
ratio of $3^{\alpha}/(2^{\alpha+2} + \epsilon)$, for any $\epsilon > 0$.

Proof. We prove the result by giving an adversarial arrival of VMs. We evaluate the competitive ratio of any
online algorithm ALG with respect to an algorithm OPT that distributes the VMs among all the PMs "as
evenly as possible". We define a value $\beta > 1$ such that $\epsilon > (\alpha - 1)/\beta^{\alpha}$ for some value $\epsilon > 0$. Note that
such value $\beta$ can be defined for any $\epsilon > 0$. The adversarial arrival follows. In a first phase, $m$ VMs arrive,
each with load $\beta x^*$.

Let $\pi$ be the partition given by ALG. We show first that if $\pi$ uses less than $3m/4$ PMs^5 or some PM is
assigned more than 2 VMs, there exists another partition that can be obtained from $\pi$, uses exactly $3m/4$
PMs, assigns no more than 2 VMs to any PM, and has a power consumption that is not worse.

If $\pi$ uses less than $3m/4$ PMs, then there exists another partition $\pi'$ that uses exactly $3m/4$ PMs with a
power consumption that is not worse than $P(\pi)$. To see why, notice that there are PMs in $\pi$ that are assigned
more than one VM and that each load is $\beta x^* > x^*$. Then, applying repeatedly Lemma 2 until $3m/4$ PMs
are used, where $\ell_1$ and $\ell_2$ are the loads of any pair of VMs assigned to the same PM, a partition $\pi'$ such that
$P(\pi') \le P(\pi)$ can be obtained.

If in $\pi'$ some PM is assigned more than 2 VMs, then there exists another partition $\pi''$ where no PM is
assigned more than 2 VMs with a power consumption that is not worse than $P(\pi')$. To see why, consider
the following reassignment procedure. Repeatedly until there is no such PM, locate a PM $s_i$ with at least
3 VMs. Then, locate a PM $s_j$ with one single VM (which exists by the pigeonhole principle). Then, move
one VM from $s_i$ to $s_j$. From Lemma 3, each movement decreases the power consumed. Hence, $\pi''$ is still a
partition that uses $3m/4$ PMs, each PM has at most 2 VMs assigned, and $P(\pi'') \le P(\pi')$.

Then, we know that $P(\pi)$ is not smaller than the power consumption of a partition where exactly $3m/4$
PMs are used and no PM is assigned more than 2 VMs. On the other hand, OPT would have assigned each
VM to a different PM. Thus, using that $x^* = (b/(\alpha - 1))^{1/\alpha}$, the competitive ratio is

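$$\rho \ge \frac{(2\beta x^*)^{\alpha} m/4 + (\beta x^*)^{\alpha} m/2 + 3mb/4}{m(\beta x^*)^{\alpha} + mb} \ge \frac{2^{\alpha-2} \beta^{\alpha}}{\beta^{\alpha} + (\alpha - 1)} \ge 2^{\alpha-3},$$

where the last inequality follows from $\beta^{\alpha} > (\alpha - 1)$. Finally, observe that $2^{\alpha-3} > 3^{\alpha}/(2^{\alpha+2} + \epsilon)$ for $\alpha > 1$.
No more VMs arrive in this case.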

Let us consider now the case where ALG assigns the $m$ initial VMs to more than $3m/4$ PMs. Then,
after ALG has assigned the first $m$ VMs, a second batch of $m/2$ VMs arrive, each VM with load $2\beta x^*$. Let
$\pi$ be the partition output by ALG after this second batch is assigned. If in $\pi$ two of the second batch VMs
are assigned to the same PM $s_i$, by the pigeonhole principle there is at least one PM $s_j$ with at most load
$\beta x^*$. Then, from Lemma 3, the power consumed is reduced if one of the new VMs is moved from $s_i$ to $s_j$.
After repeating this process as many times as possible, a partition $\pi'$ is obtained where each of the VMs of
the second batch is assigned to a different PM, and $P(\pi') \le P(\pi)$. Since ALG used more than $3m/4$ PMs
in the first batch, in $\pi'$ there are at least $m/4$ PMs with load $3\beta x^*$. On the other hand, OPT can distribute
all the VMs in such a way that each PM has a load of $2\beta x^*$. Thus, the bound on the competitive ratio is as
follows.

$$\rho \ge \frac{m(3\beta x^*)^{\alpha}/4}{m(2\beta x^*)^{\alpha} + mb} \ge \frac{3^{\alpha}}{2^{\alpha+2} + \epsilon},$$

where the last inequality follows from $\epsilon > (\alpha - 1)/\beta^{\alpha}$.

^5 For clarity we omit floors and ceilings in the proof.



5.2 A Stronger Lower Bound for 2 Physical Machines 

We show in this section that the above lower bound can be made stronger for $m = 2$.

Theorem 4. There is no online VMA algorithm that achieves a competitive ratio of $3^{\alpha}/2^{\alpha+1}$, if $m = 2$.

Proof. We prove the result by showing an adversarial arrival of VMs. We evaluate the competitive ratio of
any online algorithm ALG with respect to an optimal algorithm OPT that knows the future VM arrivals.
The adversarial arrival follows. In a first phase two VMs $d_1$ and $d_2$ arrive, with loads $\ell(d_1) = \ell(d_2) = 6x^*$.
(Recall from Section 2 that $x^* = (b/(\alpha - 1))^{1/\alpha}$.)

If ALG assigns both VMs to the same PM, the power consumed will be $(12x^*)^{\alpha} + b$, whereas OPT would
assign them to different PMs, with a power consumption of $2((6x^*)^{\alpha} + b)$. Hence, the ratio $\rho$ would be

$$\rho = \frac{(12x^*)^{\alpha} + b}{2((6x^*)^{\alpha} + b)} > \frac{12^{\alpha}}{2(6^{\alpha} + \alpha - 1)} > \frac{12^{\alpha}}{2(6^{\alpha} + 2^{\alpha})} = \frac{6^{\alpha}}{2(3^{\alpha} + 1)},$$

where the first inequality follows from $\alpha > 1$ and the second from $\alpha - 1 < 2^{\alpha}$ for any $\alpha > 1$. It is enough
to prove that $6^{\alpha}/(2(3^{\alpha} + 1)) > (3/2)^{\alpha}/2$, or equivalently $4^{\alpha} > 3^{\alpha} + 1$, which is true for any $\alpha > 1$. Then,
there are no new VM arrivals.

If, otherwise, ALG assigns each VM $d_1$ and $d_2$ to a different PM, then a third VM $d_3$ arrives, with load
$\ell(d_3) = 12x^*$. Then, ALG must assign it to one of the PMs. Independently of which PM is used, the power
consumption of the final configuration is $(18x^*)^{\alpha} + (6x^*)^{\alpha} + 2b$. On its side, OPT assigns $d_1$ and $d_2$ to one
PM, and $d_3$ to the other, with a power consumption of $2((12x^*)^{\alpha} + b)$. Hence, the competitive ratio $\rho$ is

$$\rho = \frac{(18x^*)^{\alpha} + (6x^*)^{\alpha} + 2b}{2((12x^*)^{\alpha} + b)} > \frac{18^{\alpha} + 6^{\alpha}}{2(12^{\alpha} + \alpha - 1)} > \frac{18^{\alpha} + 6^{\alpha}}{2(12^{\alpha} + 4^{\alpha})} \ge \frac{3^{\alpha}}{2^{\alpha+1}},$$

where the first inequality follows from $\alpha > 1$, the second from $\alpha - 1 < 4^{\alpha}$ for any $\alpha > 1$, and the third
from $(9^{\alpha} + 3^{\alpha})/(6^{\alpha} + 2^{\alpha}) \ge (3/2)^{\alpha}$, which can be checked to be true. Then, there are no new VM arrivals
and the claim follows.
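
The adversary used in this proof is simple enough to simulate. The sketch below plays it against an arbitrary online rule for $m = 2$ and returns the ratio between the power of the online assignment and the power of the offline assignment described in the proof. The callback interface `online_choice(pm_loads, vm_load)`, the two sample rules, and the choice $b = 1$ are assumptions made for illustration.

```python
def f(x, alpha, b):
    """Power of one PM with load x."""
    return 0.0 if x == 0 else x ** alpha + b


def adversary_ratio(online_choice, alpha, b):
    """Play the two-phase adversary of Theorem 4 against an online rule.

    online_choice(pm_loads, vm_load) must return 0 or 1, the index of the
    PM that receives the VM (a hypothetical interface).
    """
    x_star = (b / (alpha - 1.0)) ** (1.0 / alpha)
    pm_loads = [0.0, 0.0]
    for _ in range(2):                          # phase 1: two VMs of load 6 x*
        pm_loads[online_choice(pm_loads, 6 * x_star)] += 6 * x_star
    if min(pm_loads) == 0.0:                    # both VMs share a PM: stop here
        offline = 2 * f(6 * x_star, alpha, b)
    else:                                       # phase 2: one VM of load 12 x*
        pm_loads[online_choice(pm_loads, 12 * x_star)] += 12 * x_star
        offline = 2 * f(12 * x_star, alpha, b)
    online = sum(f(x, alpha, b) for x in pm_loads)
    return online / offline


def lower_bound(alpha):
    """The bound of Theorem 4: 3^alpha / 2^(alpha + 1)."""
    return 3 ** alpha / 2 ** (alpha + 1)


def always_first(pm_loads, vm_load):
    return 0


def least_loaded(pm_loads, vm_load):
    return 0 if pm_loads[0] <= pm_loads[1] else 1


ratios = {
    alpha: (adversary_ratio(always_first, alpha, 1.0),
            adversary_ratio(least_loaded, alpha, 1.0))
    for alpha in (1.5, 2.0, 3.0, 5.0)
}
```

For both sample rules the simulated ratio stays above `lower_bound(alpha)` for the values of $\alpha$ tried here, in line with the theorem.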

5.3 Upper Bound for 2 Physical Machines 

In this section we present a VMA algorithm (detailed in Algorithm 1) and show an upper bound on its
competitive ratio. The algorithm is online, that is, the VMs are revealed to the algorithm one by one. $A_1$
and $A_2$ are the sets of VMs assigned to PMs $s_1$ and $s_2$, respectively, at any given time.

Algorithm 1: Online VMA algorithm for $m = 2$.

for each VM $d_i$ do
    if $\ell(d_i) + \ell(A_1) \le (b/(2^{\alpha} - 2))^{1/\alpha}$ or $\ell(A_1) \le \ell(A_2)$ then
        $d_i$ is assigned to $s_1$
    else
        $d_i$ is assigned to $s_2$
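
A direct transcription of Algorithm 1 into Python could look as follows. The function name and the list-based representation of $A_1$ and $A_2$ are assumptions; the comparisons are taken as non-strict, which is also an assumption wherever the extracted text does not distinguish $<$ from $\le$.

```python
def online_vma_two_pms(loads, alpha, b):
    """Assign VMs one by one to two PMs, following Algorithm 1.

    Returns (a1, a2): the loads of the VMs placed on s_1 and on s_2.
    """
    threshold = (b / (2.0 ** alpha - 2.0)) ** (1.0 / alpha)
    a1, a2 = [], []
    load1 = load2 = 0.0
    for d in loads:                    # VMs are revealed in this order
        if d + load1 <= threshold or load1 <= load2:
            a1.append(d)               # keep filling s_1, or s_1 is the lighter PM
            load1 += d
        else:
            a2.append(d)               # s_2 is strictly lighter than s_1
            load2 += d
    return a1, a2


# alpha = 2 and b = 8 give a threshold of (8 / 2) ** 0.5 = 2.
small_a1, small_a2 = online_vma_two_pms([0.5, 1.0, 0.5], alpha=2.0, b=8.0)
large_a1, large_a2 = online_vma_two_pms([3.0, 1.0, 1.0, 2.0], alpha=2.0, b=8.0)
```

In the first run the total load never exceeds the threshold, so every VM stays on $s_1$; in the second run each VM after the first goes to whichever PM currently has the smaller load.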



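We prove the competitive ratio of Algorithm 1 in the following theorem.

Theorem 5. For a system where $m = 2$, there exists an online VMA algorithm that achieves the following
competitive ratios.

$$\rho = 1, \quad \text{for } \ell(D) \le \left(\frac{b}{2^{\alpha} - 2}\right)^{1/\alpha},$$

$$\rho \le \max\left\{2, \left(\frac{3}{2}\right)^{\alpha-1}\right\}, \quad \text{for } \ell(D) > \left(\frac{b}{2^{\alpha} - 2}\right)^{1/\alpha}.$$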



Proof. Consider Algorithm 1 shown above. If $\ell(D) \le (b/(2^{\alpha} - 2))^{1/\alpha}$, then the competitive ratio is 1, as we
show. Algorithm 1 assigns all the VMs to PM $s_1$. On the other hand, the optimal offline algorithm also assigns
all the VMs to one PM. To prove it, it is enough to show that $\ell(D)^{\alpha} + b \le \ell(A_1)^{\alpha} + \ell(A_2)^{\alpha} + 2b$. Using
that $\ell(A_1)^{\alpha} + \ell(A_2)^{\alpha} \ge 2(\ell(D)/2)^{\alpha}$ and manipulating, it is enough to prove $\ell(D) \le 2(b/(2^{\alpha} - 2))^{1/\alpha}$.
This is true for $\ell(D) \le (b/(2^{\alpha} - 2))^{1/\alpha}$.

We consider now the case $(b/(2^{\alpha} - 2))^{1/\alpha} < \ell(D) \le 2(b/(2^{\alpha} - 2))^{1/\alpha}$. Within this range, for the
optimal algorithm it is still better to assign all VMs to one PM, as shown. Then, the competitive ratio $\rho$ is

$$\rho \le \frac{\ell(A_1)^{\alpha} + \ell(A_2)^{\alpha} + 2b}{\ell(D)^{\alpha} + b} \le \frac{\ell(D)^{\alpha} + 2b}{\ell(D)^{\alpha} + b} \le 2. \qquad (1)$$

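Consider any given step after $\ell(D) > 2(b/(2^{\alpha} - 2))^{1/\alpha}$. Within this range, the optimal algorithm may
assign the VMs to one or both PMs. If the optimal algorithm assigns to one PM, Inequality (1) applies.
Otherwise, the competitive ratio $\rho$ is

$$\rho \le \frac{\ell(A_1)^{\alpha} + \ell(A_2)^{\alpha} + 2b}{2(\ell(D)/2)^{\alpha} + 2b} \le \frac{2^{\alpha-1}(\ell(A_1)^{\alpha} + \ell(A_2)^{\alpha})}{\ell(D)^{\alpha}} = 2^{\alpha-1} \frac{(\ell(A_1)/\ell(A_2))^{\alpha} + 1}{(\ell(A_1)/\ell(A_2) + 1)^{\alpha}}.$$

Then, in order to obtain a ratio at most $x^{\alpha}/2$, where $x$ will be set later, it is enough to guarantee

$$2^{\alpha-1} \frac{(\ell(A_1)/\ell(A_2))^{\alpha} + 1}{(\ell(A_1)/\ell(A_2) + 1)^{\alpha}} \le \frac{x^{\alpha}}{2},$$

$$\frac{(\ell(A_1)/\ell(A_2))^{\alpha} + 1}{(\ell(A_1)/\ell(A_2) + 1)^{\alpha}} \le \left(\frac{x}{2}\right)^{\alpha}.$$

Without loss of generality, assume $\ell(A_1) \le \ell(A_2)$. This implies that $(\ell(A_1)/\ell(A_2))^{\alpha} \le \ell(A_1)/\ell(A_2)$.
Then, it is enough to have

$$\frac{\ell(A_1)/\ell(A_2) + 1}{(\ell(A_1)/\ell(A_2) + 1)^{\alpha}} \le \left(\frac{x}{2}\right)^{\alpha}.$$

Let us now define $\ell(A_1) + \hat{\ell} = \ell(A_2)$ for some $\hat{\ell} \ge 0$. Manipulating and replacing, it is enough to show

$$\frac{\hat{\ell}}{\ell(A_1)} \le \frac{2 - (2/x)^{\alpha/(\alpha-1)}}{(2/x)^{\alpha/(\alpha-1)} - 1}. \qquad (2)$$

If Inequality (2) holds the theorem is proved. Otherwise, the following claim is needed.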
Claim. If $\ell(D) > 2(b/(2^\alpha - 2))^{1/\alpha}$, then there must exist a VM $d_i$ in $D$ such that $\ell(d_i) \ge |\ell(A_2) - \ell(A_1)|$.

Proof. If $\ell(A_2) = \ell(A_1)$ the claim follows trivially. Assume that $\ell(A_2) \ne \ell(A_1)$. Consider any given
time when $\ell(D) > 2(b/(2^\alpha - 2))^{1/\alpha}$. For the sake of contradiction, assume that for all $d_i \in D$ it is
$\ell(d_i) < |\ell(A_2) - \ell(A_1)|$. Let $d_1, d_2, \ldots, d_r$ be the order in which the VMs were revealed to Algorithm 1.
And let the respective sets of VMs be called $D_i = \{d_j \mid j \in [1, i]\}$, that is, $D_r = D$. Given that $\ell(D) >
2(b/(2^\alpha - 2))^{1/\alpha} > (b/(2^\alpha - 2))^{1/\alpha}$, the VM $d_r$ was assigned to the PM with smaller load. Then, either
$\ell(d_r) \ge |\ell(A_2) - \ell(A_1)|$, which would be a contradiction, or, if $\ell(d_r) < |\ell(A_2) - \ell(A_1)|$, the PM with
the smaller load before and after assigning $d_r$ is the same. The argument can be repeated iteratively backwards
for each $d_{r-1}$, $d_{r-2}$, etc. until, for some $j \in [1, r)$, either it is $\ell(d_j) \ge |\ell(A_2) - \ell(A_1)|$, reaching
a contradiction, or the total load is $\ell(D_j) < (b/(2^\alpha - 2))^{1/\alpha}$. If the latter is the case, we know that for
$i \in [1, j]$ every $d_i$ was assigned to $s_1$. Recall that for $i \in (j, r]$ each $d_i$ was assigned to the same PM.
And, given that $d_{j+1}$ is the first VM for which the total load is at least $(b/(2^\alpha - 2))^{1/\alpha}$, that PM is $s_2$.
But then, we have $\ell(A_2) < \ell(A_1) < (b/(2^\alpha - 2))^{1/\alpha}$, which is a contradiction with the assumption that
$\ell(D) > 2(b/(2^\alpha - 2))^{1/\alpha}$.

Using the claim above, we know that there exists a $d_i$ in the input such that

\[
\ell(d_i) \ge \hat{\ell} > \ell(A_1)\,\frac{2 - (2/x)^{\alpha/(\alpha-1)}}{(2/x)^{\alpha/(\alpha-1)} - 1}.
\]

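From the latter, it can be seen that if $x \ge 2(3/4)^{1-1/\alpha}$, then we have that $\hat{\ell} > 2\ell(A_1)$. Then, the competitive
ratio $\rho$ is

\[
\rho \le \frac{\ell(A_1)^\alpha + (\ell(A_1) + \hat{\ell})^\alpha + 2b}{(2\ell(A_1))^\alpha + \hat{\ell}^\alpha + 2b}
\le \frac{\ell(A_1)^\alpha + (\ell(A_1) + \hat{\ell})^\alpha}{(2\ell(A_1))^\alpha + \hat{\ell}^\alpha}.
\]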

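Using calculus, over the range $\hat{\ell} \ge 2\ell(A_1)$ this ratio is maximized for $\hat{\ell} = 2\ell(A_1)$. Then, we have $\rho \le (1 + 3^\alpha)/(2 \cdot 2^\alpha)$.
Then, in order to obtain a ratio at most $x^\alpha/2$, it is enough to guarantee $(1 + 3^\alpha)/(2 \cdot 2^\alpha) \le x^\alpha/2$, which
yields $x \ge ((1 + 3^\alpha)/2^\alpha)^{1/\alpha}$.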

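For any $\alpha > 1$, it holds that

\[
2(3/4)^{1-1/\alpha} \ge \left(\frac{1 + 3^\alpha}{2^\alpha}\right)^{1/\alpha}.
\]

Hence, setting $x = 2(3/4)^{1-1/\alpha}$ satisfies both requirements on $x$, and the competitive ratio is $\rho \le (2(3/4)^{1-1/\alpha})^\alpha/2 = (3/2)^{\alpha-1}$.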
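The last two steps of the proof can be verified numerically for the values of $\alpha$ used later in the experiments. This is only a sanity-check sketch; the function names are hypothetical and chosen here for illustration.

```python
def x_from_claim(alpha):
    """Value of x required by the claim-based step: 2 * (3/4)**(1 - 1/alpha)."""
    return 2 * (3 / 4) ** (1 - 1 / alpha)


def x_from_calculus(alpha):
    """Value of x required by the calculus step: ((1 + 3**alpha) / 2**alpha)**(1/alpha)."""
    return ((1 + 3 ** alpha) / 2 ** alpha) ** (1 / alpha)


def theorem5_bound(alpha):
    """Bound of Theorem 5 above the threshold: max{2, (3/2)**(alpha - 1)}."""
    return max(2.0, 1.5 ** (alpha - 1))


if __name__ == "__main__":
    for alpha in (1.5, 2.0, 3.0):
        x = x_from_claim(alpha)
        # The larger requirement on x is the one coming from the claim.
        assert x >= x_from_calculus(alpha)
        # x**alpha / 2 collapses to the closed form (3/2)**(alpha - 1).
        assert abs(x ** alpha / 2 - 1.5 ** (alpha - 1)) < 1e-9
        print(alpha, x, theorem5_bound(alpha))
```

Note that $(3/2)^{\alpha-1}$ exceeds 2 only for $\alpha$ larger than roughly 2.71, so for $\alpha \in \{1.5, 2\}$ the constant 2 from Inequality (1) is the dominating term of the maximum.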

5.4 Upper Bound for Unbounded Number of Physical Machines 

In this section we introduce an online VMA algorithm for the case when the number of PMs is unbounded.
The algorithm uses the load of the newly revealed VM in order to decide the PM where it is assigned. If the
load of the revealed VM is at least $x^*$, the algorithm assigns this VM to a new PM without any other VM
already assigned to it. Otherwise, the algorithm schedules the revealed VM to the most loaded PM whose
current load is smaller than $x^*$. Note that, since the case under consideration assumes the existence of an
unbounded number of PMs, there always exists at least one PM whose current load is smaller than $x^*$. A
detailed description of this algorithm is shown in Algorithm 2. As before, $A_j$ is the set of VMs assigned to
PM $s_j$ at a given time.

We prove the approximation ratio of Algorithm 2 in the following theorem.

Theorem 6. For a system with an unbounded number of PMs, Algorithm 2 achieves a competitive ratio of 1 if
no VM $d_i$ has $\ell(d_i) < x^*$, and of $2^{\alpha-1} + x^* / \sum_{d_i : \ell(d_i) < x^*} \ell(d_i)$ otherwise.
Algorithm 2: Online VMA algorithm for unbounded number of PMs.

for each VM $d_i$ do
    if $\ell(d_i) \ge x^*$ then
        $d_i$ is assigned to a new PM
    else
        $d_i$ is assigned to the PM $s_j$ such that $\ell(A_k) \le \ell(A_j) < x^*$ for all $k$ with $\ell(A_k) < x^*$
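The assignment rule of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `assign_online` and the list-based representation are assumptions, the unbounded pool of PMs is modeled by opening a fresh PM whenever no already opened PM has load below $x^*$ (an empty PM is then the most loaded PM below $x^*$), and VMs with load exactly $x^*$ are treated as large.

```python
def assign_online(loads, x_star):
    """Assign VM loads one by one; return the VMs placed on each opened PM."""
    pms = []     # pms[j] is the list of VM loads placed on PM j
    totals = []  # totals[j] is the current load of PM j
    for load in loads:
        # Opened PMs that may still receive a small VM (load strictly below x*).
        candidates = [j for j, total in enumerate(totals) if total < x_star]
        if load >= x_star or not candidates:
            # Large VM, or every opened PM is already at or above x*: use a fresh PM.
            pms.append([load])
            totals.append(load)
        else:
            # Small VM: most loaded PM among those still below x*.
            j = max(candidates, key=lambda k: totals[k])
            pms[j].append(load)
            totals[j] += load
    return pms


if __name__ == "__main__":
    print(assign_online([5, 12, 3, 4, 2, 20, 1], x_star=10))
```

In this sketch at most one opened PM has load below $x^*$ at any time, and a PM that holds small VMs never reaches load $2x^*$, which are the two structural properties the proof relies on.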



Proof. Let us assume that we have an optimal algorithm, that is, an algorithm that gives an optimal solution
for any instance. Let us denote by $\pi^*$ the optimal solution obtained by the optimal algorithm, and by $A_i$ the
set of VMs assigned to PM $s_i$ in that solution, with load $\ell(A_i)$, for a particular instance of the VMA problem. Furthermore, $A_i$
is decomposed in $d_{i_1}, d_{i_2}, \ldots, d_{i_r}$, where each $d_{i_j}$ is a VM that $\pi^*$ assigns to $s_i$. Using simple algebra, it
holds:

\[
f(\ell(A_i)) = \sum_{d_{i_j} \in A_i} \ell(d_{i_j})\,\frac{f(\ell(A_i))}{\ell(A_i)}.
\]

It is possible now to split the set $A_i$ in two sets, one with those VMs assigned to $s_i$ whose load is strictly
smaller than $x^*$ and a second set that contains those VMs assigned to $s_i$ whose load is at least $x^*$. In
terms of notation, we say that $A_i$ is split in $B_i$ and $S_i$ (where B stands for Big loads and S stands for Small
loads). Therefore, it also holds:

\[
f(\ell(A_i)) = \sum_{d_{i_j} \in B_i} \ell(d_{i_j})\,\frac{f(\ell(A_i))}{\ell(A_i)} + \sum_{d_{i_j} \in S_i} \ell(d_{i_j})\,\frac{f(\ell(A_i))}{\ell(A_i)}.
\]

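On the other hand, by definition of $x^*$, it holds that:

\[
\frac{f(\ell(A_i))}{\ell(A_i)} \ge \frac{f(x^*)}{x^*}
\]

for all $i$ (indeed, for any load). Moreover, if a PM has been assigned a VM with load $\ell(d_{i_j})$ of at least $x^*$, it
also holds that $f(\ell(A_i))/\ell(A_i) \ge f(\ell(d_{i_j}))/\ell(d_{i_j})$. Hence, we obtain the following inequality:

\[
f(\ell(A_i)) \ge \sum_{d_{i_j} \in B_i} f(\ell(d_{i_j})) + \frac{f(x^*)}{x^*} \sum_{d_{i_j} \in S_i} \ell(d_{i_j}).
\]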



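In order to lower bound the power consumption of the solution $\pi^*$, we plug the above inequality into the
corresponding equation:

\[
P(\pi^*) = \sum_{A_i \in \pi^*} f(\ell(A_i)) \ge \sum_{A_i \in \pi^*} \sum_{d_{i_j} \in B_i} f(\ell(d_{i_j})) + \frac{f(x^*)}{x^*} \sum_{A_i \in \pi^*} \sum_{d_{i_j} \in S_i} \ell(d_{i_j}),
\]

or, equivalently expressed in more compact notation:

\[
P(\pi^*) \ge \sum_{d_i : \ell(d_i) \ge x^*} f(\ell(d_i)) + \frac{f(x^*)}{x^*} \sum_{d_i : \ell(d_i) < x^*} \ell(d_i).
\]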
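The lower bound on $P(\pi^*)$ derived above is easy to evaluate for a concrete instance. The sketch below assumes the power model $f(x) = x^\alpha + b$ and takes $x^*$ as the minimizer of $f(x)/x$, which under that model is $(b/(\alpha-1))^{1/\alpha}$; the function name and the sample instance are hypothetical.

```python
def optimal_power_lower_bound(loads, alpha, b):
    """Evaluate: sum of f(load) over large VMs + (f(x*)/x*) * total load of small VMs."""
    # ASSUMPTION: f(x) = x**alpha + b, so f(x)/x is minimized at x* = (b/(alpha-1))**(1/alpha).
    x_star = (b / (alpha - 1)) ** (1 / alpha)

    def f(x):
        return x ** alpha + b

    big_part = sum(f(load) for load in loads if load >= x_star)
    small_part = sum(load for load in loads if load < x_star)
    return big_part + f(x_star) / x_star * small_part


if __name__ == "__main__":
    # With alpha = 2 and b = 100 we get x* = 10 and f(x*)/x* = 20.
    print(optimal_power_lower_bound([5, 12, 3, 4, 2, 20, 1], alpha=2.0, b=100.0))
```

For this instance the two large VMs contribute $f(12) + f(20) = 744$ and the small VMs contribute $20 \cdot 15 = 300$, so no assignment can consume less than 1044.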

Consider now Algorithm 2. Let us denote by $\pi$ a solution that Algorithm 2 gives for a particular instance.
Also, let us denote by $A_i$ the load assigned by Algorithm 2 to PM $s_i$. Note that due to the design of the
algorithm, after the last VM has been assigned, either there is only one loaded PM whose current load is
smaller than $x^*$, or every loaded PM has a load of at least $x^*$. We study these two cases separately.
- Case when $\ell(A_i) \ge x^*$ for all $i$. In this case, in a solution $\pi$ there are PMs with two types
of load: those that are loaded with one VM whose load is at least $x^*$, and those that are loaded with
VMs whose load is strictly smaller than $x^*$ but whose total load is at least $x^*$. Note that due
to the design of the algorithm, none of the PMs in the second group has a load bigger than $2x^*$. Let us
denote by $B$ the set of demands with load at least $x^*$, and by $S$ the set of demands with load less than $x^*$.
Therefore, it holds:

\[
P(\pi) = \sum_{d \in B} f(\ell(d)) + \sum_{A_i : x^* \le \ell(A_i) < 2x^*} f(\ell(A_i)) \le \sum_{d \in B} f(\ell(d)) + \frac{f(2x^*)}{2x^*} \sum_{d \in S} \ell(d).
\]

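Computing the ratio $\rho$ between $P(\pi)$ and $P(\pi^*)$, we obtain the following inequality:

\[
\rho \le \frac{\sum_{d \in B} f(\ell(d)) + \frac{f(2x^*)}{2x^*} \sum_{d \in S} \ell(d)}{\sum_{d \in B} f(\ell(d)) + \frac{f(x^*)}{x^*} \sum_{d \in S} \ell(d)} \le 2^{\alpha-1}. \tag{3}
\]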

- Case when there exists $s_i$ such that $\ell(A_i) < x^*$. In this case, $\pi$ gives solutions with three types of loaded
PMs: those that are loaded with one VM whose load is at least $x^*$, those that are loaded with VMs
whose load is strictly smaller than $x^*$ but whose total load is at least $x^*$, and one PM whose total
load is strictly smaller than $x^*$. Let us denote such a PM by $s'$. Therefore, it holds:

\[
\begin{aligned}
P(\pi) &= \sum_{d \in B} f(\ell(d)) + \sum_{A_i : x^* \le \ell(A_i) < 2x^*} f(\ell(A_i)) + f(\ell(A_{s'})) \\
&\le \sum_{d \in B} f(\ell(d)) + \frac{f(2x^*)}{2x^*} \Big( \sum_{d \in S} \ell(d) - \ell(A_{s'}) \Big) + f(\ell(A_{s'})) \\
&= \sum_{d \in B} f(\ell(d)) + \frac{f(2x^*)}{2x^*} \Big( \sum_{d \in S} \ell(d) - \ell(A_{s'}) \Big) + \ell(A_{s'})^\alpha + b.
\end{aligned}
\]

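Let us denote the latter expression by $\Pi(\pi)$. Computing the ratio $\rho$ between $P(\pi)$ and $P(\pi^*)$, we obtain
the following inequality:

\[
\rho \le \frac{\Pi(\pi)}{\sum_{d \in B} f(\ell(d)) + \frac{f(x^*)}{x^*} \sum_{d \in S} \ell(d)}
\le 2^{\alpha-1} + \frac{\ell(A_{s'})^\alpha - \ell(A_{s'}) \frac{f(2x^*)}{2x^*} + b}{\frac{f(x^*)}{x^*} \sum_{d \in S} \ell(d)}
\le 2^{\alpha-1} + \frac{x^*}{\sum_{d \in S} \ell(d)}. \tag{4}
\]

Since $x^* / \sum_{d \in S} \ell(d)$ is always positive, the bound in (4) dominates the bound in (3), and the competitive ratio of Algorithm 2 is at most $2^{\alpha-1} + x^* / \sum_{d \in S} \ell(d)$.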

Observe that, when no VM $d$ has load $\ell(d) < x^*$, i.e., $S = \emptyset$, equations (3) and (4) become $\rho = P(\pi)/P(\pi^*) \le 1$.

6 Experimental Evaluation 

In this section we experimentally evaluate the performance of the online VMA algorithm we proposed in
Section 5.4, extended to be able to handle a bounded number of PMs. Additionally, we compare it with other
online placement algorithms.
Fig. 1. VM load distributions used in the evaluations: (a) Trace A (synthetic traces); (b) Trace B (Google traces).



6.1 Experimental Setup 

The online algorithm whose performance is evaluated here, which we call Algorithm VMA, behaves exactly
as Algorithm 2 whenever possible. Otherwise, the new VM is assigned to the least loaded PM.

The performance of Algorithm VMA is first compared with a lower bound, denoted LB VMA, that is
obtained as follows. The input VMs are sorted in non-increasing order of their loads. Then, using this order,
as many VMs as possible with load at least $x^*$ are assigned to different PMs. Let $L$ be the total load of the
VMs still unassigned. If there are at least $\lceil L/x^* \rceil$ PMs still unused, LB VMA uses exactly $\lceil L/x^* \rceil$ PMs.
Otherwise all PMs are used by LB VMA. Finally, the load $L$ is assigned among all used PMs as if it could
be infinitely divided (i.e., as a fluid), using a water-filling algorithm [18].
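The construction of LB VMA can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the names `water_fill` and `lb_vma` are hypothetical, the power model $f(x) = x^\alpha + b$ is assumed, the number of extra PMs is taken as $L/x^*$ rounded up (an assumption of this sketch), those PMs are counted in addition to the ones holding large VMs, and at least one PM is assumed to be available.

```python
import math


def water_fill(base_loads, fluid):
    """Pour `fluid` units of load onto PMs with the given base loads, always raising
    the currently lowest PMs first; returns the final loads in input order."""
    if not base_loads:
        return []
    levels = sorted(base_loads)
    for k in range(1, len(levels) + 1):
        # Common level reached if only the k lowest PMs receive fluid.
        h = (sum(levels[:k]) + fluid) / k
        if k == len(levels) or h <= levels[k]:
            return [max(load, h) for load in base_loads]


def lb_vma(loads, num_pms, alpha, b, x_star):
    """Power of the fluid relaxation described in the text (sketch under stated assumptions)."""
    ordered = sorted(loads, reverse=True)
    # As many large VMs as possible get a PM of their own.
    big = [load for load in ordered if load >= x_star][:num_pms]
    fluid = sum(ordered[len(big):])          # total load L still unassigned
    unused = num_pms - len(big)
    wanted = math.ceil(fluid / x_star)       # ASSUMPTION: L / x* rounded up
    extra = wanted if unused >= wanted else unused
    final = water_fill(big + [0.0] * extra, fluid)
    # ASSUMPTION: a loaded PM consumes x**alpha + b, an idle one nothing.
    return sum(load ** alpha + b for load in final if load > 0)


if __name__ == "__main__":
    print(lb_vma([12, 5, 3, 4, 2, 20, 1], num_pms=10, alpha=2.0, b=100.0, x_star=10.0))
```

In the example, the two large VMs occupy one PM each and the remaining load of 15 is spread evenly over two additional PMs, since raising either of the large PMs would require the water level to exceed 12.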

Algorithm VMA is also compared with the following algorithms proposed in the literature. 

- Random Placement (RP) [26]: It chooses a PM for each VM uniformly at random.

- Next Fit (NF) [26]: Starting initially at the first PM, each new VM is assigned to the next PM after the
latest PM to which a VM was assigned (in a cyclic fashion).

- Least Full First (LFF) [26]: Each new VM is assigned to (one of) the least loaded PMs in the system.

- Striping (S) [22]: Each new VM is assigned to (one of) the PMs with the smallest number of VMs
assigned.

- Watts per Core (WC) [22]: Each new VM is assigned to the PM whose power would suffer the smallest
increase.
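The five placement rules listed above can be sketched with a single dispatcher. This is a minimal illustration under stated assumptions: the function name `place`, the rule codes, and the tie-breaking by lowest PM index are choices made here, and WC is evaluated with the assumed power model $f(x) = x^\alpha + b$ for a loaded PM (zero for an idle one).

```python
import random


def place(loads, num_pms, rule, alpha=2.0, b=0.0, seed=0):
    """Place VM loads on `num_pms` PMs with one of the rules RP, NF, LFF, S, WC.
    Returns the final load of every PM."""
    rng = random.Random(seed)
    totals = [0.0] * num_pms   # current load per PM
    counts = [0] * num_pms     # number of VMs per PM
    last = -1                  # PM that received the previous VM

    def power(x):
        # ASSUMPTION: a loaded PM consumes x**alpha + b, an idle one nothing.
        return x ** alpha + b if x > 0 else 0.0

    for load in loads:
        if rule == "RP":       # uniformly random PM
            j = rng.randrange(num_pms)
        elif rule == "NF":     # next PM after the last one used, cyclically
            j = (last + 1) % num_pms
        elif rule == "LFF":    # least loaded PM
            j = min(range(num_pms), key=lambda k: totals[k])
        elif rule == "S":      # PM holding the fewest VMs
            j = min(range(num_pms), key=lambda k: counts[k])
        elif rule == "WC":     # PM whose power increases the least
            j = min(range(num_pms), key=lambda k: power(totals[k] + load) - power(totals[k]))
        else:
            raise ValueError(f"unknown rule: {rule}")
        totals[j] += load
        counts[j] += 1
        last = j
    return totals


if __name__ == "__main__":
    for rule in ("RP", "NF", "LFF", "S", "WC"):
        print(rule, place([4, 1, 3, 2], num_pms=2, rule=rule))
```

With $b = 0$ opening a PM is free, so WC spreads the load much like LFF does; a positive $b$ makes WC prefer PMs that are already switched on.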

The behavior of the aforementioned algorithms is evaluated by inputting two sets of traces, synthetic
and real, shown in Figure 1. We call them Trace A and Trace B, respectively. Trace A is generated by
randomly choosing the load of each VM following a power-law distribution with exponential cutoff (power-law
distributions are similar to Zipf and Pareto distributions, cf. [6, 28]), which has been chosen so that 100 is the
maximum load of a VM. We select 10000 integer loads randomly using this distribution. This leads us to
the VM load distribution shown in Figure 1(a).
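A trace of this kind could be generated along the following lines. This is only a sketch: the text does not give the exponent or the cutoff scale of the distribution, so the `exponent` and `cutoff` values below are hypothetical placeholders, as is the function name.

```python
import math
import random


def sample_loads(n, max_load=100, exponent=1.5, cutoff=50.0, seed=0):
    """Draw n integer loads in [1, max_load] with probability proportional to
    k**(-exponent) * exp(-k / cutoff), i.e. a power law with exponential cutoff."""
    rng = random.Random(seed)
    support = range(1, max_load + 1)
    # HYPOTHETICAL parameters: exponent and cutoff are not specified in the text.
    weights = [k ** (-exponent) * math.exp(-k / cutoff) for k in support]
    return rng.choices(support, weights=weights, k=n)


if __name__ == "__main__":
    loads = sample_loads(10000)
    print(min(loads), max(loads), sum(loads) / len(loads))
```

Truncating the support at `max_load` enforces the stated maximum load of 100 directly, independently of the cutoff parameter.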



Trace B is obtained from public Google traces [20]. We extract all the tasks from these traces, assuming
that each task is an independent VM. The VMs (tasks) are sorted by the time at which they join the system.
The load of a VM is the maximum CPU load of the task. The trace then contains 124885 VMs with loads
varying between 0.31 and 12.5. The resulting VM load distribution can be seen in Figure 1(b).
Fig. 2. Comparing the power consumed by VMA with the lower bound LBVMA for $x^* = 30$ and $\alpha \in \{1.5, 2, 3\}$.



Each execution of the algorithms is run with a fixed number of PMs. This number of PMs increases from
1 to the number of VMs in the trace being used. This allows us to see how the power consumption and the
algorithms' behavior evolve when the number of available PMs in the system varies.

We run simulations for all values of $\alpha \in \{1.5, 2, 3\}$ and $x^* \in \{1, 3, 10, 30, 100, 300, 1000, 10000\}$, for
both traces and for all algorithms (including the lower bound). (Due to space restrictions we only present
results for $x^* \in \{30, 100, 300\}$, which are the most interesting.) With these two parameters, $b$ is also
characterized, and we can study different situations: the cases in which the VM loads are larger than, similar to, or
smaller than $x^*$.
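As a worked illustration of how the two parameters determine $b$, assume (as the proofs in Section 5.4 do) that a loaded PM consumes $f(x) = x^\alpha + b$ and that $x^*$ is the load minimizing $f(x)/x$. Under that assumption,

\[
\frac{d}{dx}\left(x^{\alpha-1} + \frac{b}{x}\right) = (\alpha-1)\,x^{\alpha-2} - \frac{b}{x^2} = 0
\quad\Longrightarrow\quad
b = (\alpha - 1)\,(x^*)^\alpha .
\]

For instance, $x^* = 30$ corresponds to $b = 900$ for $\alpha = 2$, to $b = 54000$ for $\alpha = 3$, and to $b \approx 82.2$ for $\alpha = 1.5$.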

6.2 Experimental Results 

The results obtained are presented as graphs in which the power consumed is represented as a function of
the number of PMs used. We start by using Trace A, comparing the resulting power consumed using our
algorithm with the lower bound on the optimal power consumption given by LBVMA. This comparison is
run for a fixed value of $x^*$ ($x^* = 30$) and the different values of $\alpha$ considered: 1.5, 2, and 3. The results are
shown in Figure 2. As can be observed, there is no qualitative difference in the solutions when $\alpha$ varies.
(Similar results are obtained with other values of $x^*$ and with Trace B.) As can be seen, the power consumed
by the partitions found with VMA is very close to the lower bound obtained with LBVMA. This shows that
the performance of VMA is close to optimal.

We then compare both VMA and LBVMA with algorithms RP, LFF, NF, WC and S. Figure 3 shows
the result of running these algorithms with Trace A and Trace B, for $\alpha = 2$ and different values of $x^*$
($x^* \in \{30, 100, 1000\}$).

Observe that with Trace A and small values of $x^*$, like $x^* = 30$, all the algorithms present similar
results, and VMA does not show a significantly better performance than the rest. This is so because
when $x^*$ is small relative to the loads in the system, using a new PM is cheap,
and all the algorithms (including VMA) essentially distribute the load evenly among the available PMs, either explicitly or implicitly.
Fig. 3. Performance of the algorithms for Trace A and Trace B, $\alpha = 2$ and different values of $x^*$.



However, Figures 3(b) and 3(c) already show a change in the behavior due to the smaller size of the
loads in Trace B, on the one hand, and the higher value of $x^*$ on the other. In these cases, WC and VMA,
which have been designed to be energy-aware, gain an advantage over the other algorithms. In fact,
both algorithms reach a stable number of PMs that is enough to place all the available load, not requiring
new PMs and, hence, not paying the switch-on cost of additional PMs (characterized by $b$).

7 Conclusions and Open Problems 

In this paper we have studied a particular case of the generalized assignment problem with applications 
to Cloud Computing. We have considered the problem of assigning virtual machines (VMs) to physical 
machines (PMs) so that the power consumption is minimized, a problem that we call virtual machine
assignment (VMA). In our theoretical analysis we have shown that the VMA problem is NP-hard, we have
shown a PTAS that solves VMA offline, and we have proved upper and lower bounds on the competitive 
ratio of VMA online algorithms. We have also carried out extensive simulations using synthetic data as well 
as real world inputs such as Google cluster data. The simulations show that in practice the performance of 
our online algorithms is very close to a lower bound on the offline optimal, and better than known techniques 
that are currently used. 

In our model we have assumed that customers specify a CPU requirement that must be guaranteed and 
will not change, that migration of VMs among PMs is not feasible, and that the service provider may increase 
the number of PMs at a cost. To the best of our knowledge, this model has not been studied previously 
from a theoretical standpoint. Other models in the experimental literature include more client requirements 
(such as speed, disk, memory, etc.), migration of VMs for free, dynamic change of loads, and/or different 
cost functions. Being parametric, our energy cost function generalizes other functions considered in the
literature. For settings where the dominating cost is other than the CPU load, our results still apply after
appropriately tuning the cost function. If the migration of VMs is free, our results also still apply by running the
offline approximation algorithm each time a load changes or a new VM arrives. We leave for future work the
consideration of migration of VMs at a cost, and the combination of various resource requirements, none of
which is a bottleneck.